51 research outputs found

    Model selection and sensitivity analysis for sequence pattern models

    In this article we propose a maximum a posteriori (MAP) criterion for model selection in the motif discovery problem and investigate conditions under which the MAP criterion asymptotically selects the correct model size. We also investigate the robustness of the MAP criterion to the prior specification and, based on sensitivity considerations, provide guidelines for choosing prior hyperparameters for motif models. Comment: Published in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/193940307000000301.

    The impact of prior information on estimates of disease transmissibility using Bayesian tools

    The basic reproductive number (R₀) and the distribution of the serial interval (SI) are often used to quantify transmission during an infectious disease outbreak. In this paper, we present estimates of R₀ and the SI from the 2003 SARS outbreak in Hong Kong and Singapore and the 2009 pandemic influenza A(H1N1) outbreak in South Africa, using methods that extend an existing Bayesian framework. The extended framework allows additional information, such as contact tracing or household data, to be incorporated through prior distributions. For the influenza outbreak in South Africa, the estimates of R₀ and the SI were similar regardless of the prior information (R₀ = 1.36–1.46; μ = 2.0–2.7, where μ is the mean of the SI). For the SARS outbreak, the estimates of R₀ and μ ranged from 2.0 to 4.4 and from 7.4 to 11.3, respectively, and varied depending on whether contact tracing data were used. The impact of the contact tracing data was likely due to the small number of SARS cases relative to the size of the contact tracing sample.
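
    To illustrate how prior information can shift an estimate of R₀, here is a minimal sketch under simplifying assumptions that are not taken from the paper: daily case counts follow N_t ~ Poisson(R₀ · Σ_j w_j N_{t−j}) with a fixed, known SI distribution w, and R₀ has a gamma prior, so the posterior is gamma by conjugacy. The function names and the case counts are hypothetical, and the paper's framework (which also estimates the SI) is not reproduced here.

```python
def r0_gamma_posterior(cases, si_pmf, prior_shape, prior_rate):
    """Gamma posterior (shape, rate) for R0, assuming (hypothetically)
    N_t ~ Poisson(R0 * sum_j w_j * N_{t-j}) with a fixed SI pmf w,
    where si_pmf[0] = P(SI = 1 day), si_pmf[1] = P(SI = 2 days), ..."""
    shape, rate = float(prior_shape), float(prior_rate)
    for t in range(1, len(cases)):
        # Total infectiousness on day t contributed by earlier cases.
        lam = sum(si_pmf[j - 1] * cases[t - j]
                  for j in range(1, min(t, len(si_pmf)) + 1))
        if lam == 0:
            continue  # no earlier cases in the SI window: treat day as uninformative
        shape += cases[t]  # Poisson-gamma conjugacy: observed counts add to the shape
        rate += lam        # expected-count multiplier adds to the rate
    return shape, rate


def posterior_mean(shape, rate):
    return shape / rate


cases = [1, 2, 4, 8]  # hypothetical daily case counts
si = [1.0]            # hypothetical SI: always exactly one day
weak = r0_gamma_posterior(cases, si, prior_shape=1.0, prior_rate=0.5)
strong = r0_gamma_posterior(cases, si, prior_shape=10.0, prior_rate=10.0)
print(posterior_mean(*weak))    # 2.0
print(posterior_mean(*strong))  # 24 / 17, about 1.41
```

    With only a handful of cases, the informative prior (mean 1) pulls the estimate well below the weak-prior value, which mirrors the abstract's point that prior data matter most when the outbreak data are sparse.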

    A New Framework for Distance-based Functional Clustering

    We develop a new framework for clustering functional data based on a distance matrix, similar to the spectral clustering approach used for multivariate data. First, we smooth the raw observations to the desired degree of smoothness using a penalized fit. Next, we construct an optimal distance matrix from either the smoothed curves or their derivatives; the choice of distance depends on the nature of the data. Finally, we apply the spectral clustering algorithm to this matrix. We applied the new approach, Functional Spectral Clustering (FSC), to simulated and real datasets, where it achieved higher accuracy rates than existing methods.
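
    The distance-then-spectral pipeline described above can be sketched for the two-cluster case as follows. This is a minimal illustration, not the authors' FSC implementation: it assumes the curves are already smoothed and sampled on a common grid, uses a trapezoid-rule L2 distance, a finite-difference derivative as a stand-in for the smoothed derivative, a Gaussian affinity with a median-distance bandwidth (a common heuristic, assumed here), and power iteration to obtain the second eigenvector of the normalized affinity matrix. All function names are hypothetical.

```python
import math


def l2_distance(f, g, t):
    """Trapezoid-rule approximation of the L2 distance between curves sampled on grid t."""
    sq = [(a - b) ** 2 for a, b in zip(f, g)]
    return math.sqrt(sum((t[i + 1] - t[i]) * (sq[i] + sq[i + 1]) / 2 for i in range(len(t) - 1)))


def forward_diff(f, t):
    """Finite-difference derivative, a stand-in for the derivative of a smoothed curve."""
    return [(f[i + 1] - f[i]) / (t[i + 1] - t[i]) for i in range(len(t) - 1)]


def spectral_bipartition(curves, t, use_derivative=False, iters=500):
    """Split (already smoothed) curves into two clusters by normalized spectral clustering."""
    if use_derivative:
        curves = [forward_diff(c, t) for c in curves]
        t = t[:-1]
    n = len(curves)
    dist = [[l2_distance(curves[i], curves[j], t) for j in range(n)] for i in range(n)]
    # Gaussian affinity; bandwidth = median pairwise distance (assumed heuristic).
    off = sorted(dist[i][j] for i in range(n) for j in range(n) if i != j)
    sigma = off[len(off) // 2] or 1.0
    w = [[math.exp(-(dist[i][j] / sigma) ** 2) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in w]
    # Normalized affinity M = D^{-1/2} W D^{-1/2}; its leading eigenvector is sqrt(deg).
    m = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    top = [math.sqrt(d) for d in deg]
    norm = math.sqrt(sum(x * x for x in top))
    top = [x / norm for x in top]

    def deflate(v):
        proj = sum(a * b for a, b in zip(v, top))
        return [a - proj * b for a, b in zip(v, top)]

    # Power iteration on M + I (eigenvalues shifted into [0, 2]) with the leading
    # eigenvector projected out, so v converges to the second eigenvector.
    v = [math.sin(i + 1.0) for i in range(n)]
    for _ in range(iters):
        v = deflate(v)
        v = [sum(m[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
    v = deflate(v)
    # The sign of D^{-1/2} v assigns each curve to one of two clusters.
    return [int(v[i] / math.sqrt(deg[i]) >= 0) for i in range(n)]


t = [2 * math.pi * i / 49 for i in range(50)]
curves = ([[math.sin(x) + 0.05 * k for x in t] for k in range(3)]
          + [[math.cos(x) + 0.05 * k for x in t] for k in range(3)])
print(spectral_bipartition(curves, t))
print(spectral_bipartition(curves, t, use_derivative=True))
```

    The `use_derivative` flag shows why the choice of distance matters: on derivatives, the vertical offsets within each group vanish, so curves are compared purely by shape.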

    Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions

    Model-based clustering approaches generally assume that the observations are generated from a mixture of distributions, with each mixture component following a particular parametric distribution. Most commonly, the components are assumed to be normal, which is inadequate in many situations, for example when skewness or multimodality is present within the components. The problem is exacerbated as the data dimension increases, leading to inaccurate groupings and incorrect inference. We propose a new Bayesian model-based clustering approach, based on the recently introduced family of geometric skew normal distributions, that can handle a variety of complexities in the data. The performance of the methodology is illustrated through simulation studies and applications to several datasets from genomics and medicine.
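
    The geometric skew normal (GSN) family mentioned above can be sketched in its univariate form. This is a minimal illustration, assuming the construction X = Y_1 + ... + Y_N with N geometric on {1, 2, ...} with success probability p and Y_i i.i.d. N(μ, σ²); the density is then a geometrically weighted mixture of normals, truncated here after a finite number of terms. The function names are hypothetical, and the multivariate version and the Bayesian mixture fitting used in the paper are not shown.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def gsn_pdf(x, mu, sigma, p, max_terms=100):
    """Univariate GSN density, assuming X = Y_1 + ... + Y_N, N ~ Geometric(p) on
    {1, 2, ...}, Y_i i.i.d. N(mu, sigma^2); the series is truncated at max_terms."""
    total = 0.0
    for k in range(1, max_terms + 1):
        weight = p * (1 - p) ** (k - 1)  # P(N = k)
        # Given N = k, X is normal with mean k*mu and variance k*sigma^2.
        total += weight * normal_pdf(x, k * mu, sigma * math.sqrt(k))
    return total


def gsn_sample(mu, sigma, p, rng):
    """Draw one GSN variate by simulating the geometric sum directly."""
    n = 1
    while rng.random() >= p:  # each failure (probability 1 - p) adds one more term
        n += 1
    return sum(rng.gauss(mu, sigma) for _ in range(n))


# Numerical check on a grid: the density should integrate to about 1,
# and the mean should be about E[N] * mu = mu / p.
step = 0.05
xs = [-20 + step * i for i in range(1201)]
dens = [gsn_pdf(x, 1.0, 1.0, 0.5) for x in xs]
area = sum(step * (dens[i] + dens[i + 1]) / 2 for i in range(len(xs) - 1))
mean = sum(step * (xs[i] * dens[i] + xs[i + 1] * dens[i + 1]) / 2 for i in range(len(xs) - 1))
print(round(area, 4), round(mean, 3))
```

    With μ > 0 the density is right-skewed, and when σ is small relative to μ the normal components centred at μ, 2μ, 3μ, ... separate enough to produce several modes, which is why this family can capture both features the abstract mentions.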

    The IMS New Researchers' Survival Guide

    Statistics is a wonderfully diverse profession, and graduate students making career choices have many options, especially in light of the dearth of students moving into the statistical sciences today. The three main career paths at the PhD level are in academia, industry/business and government. Each of these job types offers its own mix of intellectual challenge, financial reward, pressure and security. How a new researcher selects (or is selected by) a specific occupation in the statistical sciences sometimes seems more a function of luck than of conscious decision making. This concern was one of the first addressed by the New Researchers Committee (NRC) of the Institute of Mathematical Statistics in 1988, and this guide is the product of that (and later) thinking. We believe that if students were better informed about their choices, they would be less apprehensive, pursue their goals more effectively and, ultimately, be far more likely to find positions for which they are well suited. Similarly, if doctoral students were generally more familiar with various aspects of professional life, the entire statistical community would benefit. Among the transitional facts of life with which we believe new researchers should be acquainted are (1) mechanisms for applying for jobs, (2) expectations associated with different types of jobs, (3) techniques for initiating an active research program, and (4) methods of becoming more involved with the broader statistical community. The Survival Guide addresses these issues, but it also offers advice on a variety of other topics that new researchers may wish to consider as they prepare to leave graduate school. This guide is based on the Statistical Science article by the New Researchers Committee of IMS (1991). See Kruse (2002) for inspiration on statistics as a career path and Stasny (2001) for the big picture with respect to academic jobs. DeMets et al. (1998) and Shettle and Gaddy (1998) provide job outlooks for statisticians.

    Differential expression analysis with global network adjustment

    Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene's expression as a function of other genes, thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, whereas the number of biological samples is typically on the order of 10. Nevertheless, large gene expression databases exist that can be used to construct networks helpful for modeling transcriptional regulation in smaller experiments. Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in an independent out-sample. This tends to increase model stability and leads to a much greater degree of parameter shrinkage, but the resulting estimation bias is mitigated by a second round of regression. The proposed computationally efficient “over-shrinkage” method outperforms previously used LASSO-based techniques. In two independent datasets, the median proportion of explained variability in expression is approximately 25%, which substantially increases the signal-to-noise ratio, allowing more powerful inferences on differential gene expression and leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible to detect with standard differential expression methods. Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and the reliability of differential expression measures are greatly improved.
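
    The two-stage idea in the Results (a ridge model fitted on a large database with the penalty chosen by out-sample prediction error, followed by a second regression that corrects the shrinkage bias) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: the data are simulated, the predictors are centered so no intercept is fitted in the ridge step, the penalty grid is arbitrary, and the second round is taken to be an ordinary least-squares regression of the observed expression on the ridge prediction. All names are hypothetical, and the paper's specific rule that induces "over-shrinkage" is not reproduced.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge coefficients (X^T X + lam I)^{-1} X^T y, assuming centered data (no intercept)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(X, beta):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def select_lambda(X_tr, y_tr, X_out, y_out, grid):
    """Pick the penalty minimizing prediction error on an independent out-sample."""
    return min(grid, key=lambda lam: mse(predict(X_out, ridge_fit(X_tr, y_tr, lam)), y_out))


def recalibrate(pred, y):
    """Second-round OLS fit y ~ a + c * pred, undoing the scale bias of shrinkage."""
    n = len(y)
    mp, my = sum(pred) / n, sum(y) / n
    sxx = sum((q - mp) ** 2 for q in pred)
    sxy = sum((q - mp) * (v - my) for q, v in zip(pred, y))
    c = sxy / sxx
    return my - c * mp, c


rng = random.Random(0)


def make_data(n):
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [2 * u - v + rng.gauss(0, 0.5) for u, v in X]
    return X, y


X_db, y_db = make_data(200)   # hypothetical "large database"
X_out, y_out = make_data(50)  # hypothetical independent out-sample
lam = select_lambda(X_db, y_db, X_out, y_out, grid=[0.1, 1.0, 10.0, 100.0, 1000.0])
pred = predict(X_out, ridge_fit(X_db, y_db, lam))
a, c = recalibrate(pred, y_out)
adjusted = [a + c * q for q in pred]
print(lam, mse(pred, y_out), mse(adjusted, y_out))
```

    Because the second-round fit includes the identity mapping (a = 0, c = 1) as a special case, it can never increase the squared error on the data it is fitted to; when the penalty is large, the fitted slope c exceeds 1, rescaling the over-shrunk predictions.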

    Associations of NINJ2 sequence variants with incident ischemic stroke in the Cohorts for Heart and Aging in Genomic Epidemiology (CHARGE) consortium

    Background: Stroke, the leading neurologic cause of death and disability, has a substantial genetic component. We previously conducted a genome-wide association study (GWAS) in four prospective studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium and demonstrated that sequence variants near the NINJ2 gene are associated with incident ischemic stroke. Here, we sought to fine-map functional variants in the region and evaluate the contribution of rare variants to ischemic stroke risk. Methods and Results: We sequenced 196 kb around NINJ2 on chromosome 12p13 among 3,986 European ancestry participants, including 475 ischemic stroke cases, from the Atherosclerosis Risk in Communities Study, Cardiovascular Health Study, and Framingham Heart Study. Meta-analyses of single-variant tests for 425 common variants (minor allele frequency [MAF] ≥ 1%) confirmed the original GWAS results and identified an independent intronic variant, rs34166160 (MAF = 0.012), most significantly associated with incident ischemic stroke (HR = 1.80, p = 0.0003). Aggregating 278 putatively functional variants with MAF ≤ 1% using count statistics, we observed a nominally statistically significant association, with the burden of rare NINJ2 variants contributing to decreased ischemic stroke incidence (HR = 0.81; p = 0.026). Conclusion: Common and rare variants in the NINJ2 region were nominally associated with incident ischemic stroke among a subset of CHARGE participants. Allelic heterogeneity at this locus, caused by multiple rare, low-frequency, and common variants with disparate effects on risk, may explain the difficulties in replicating the original GWAS results. Additional studies that take into account the complex allelic architecture at this locus are needed to confirm these findings.
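
    The rare-variant aggregation step can be illustrated with one common form of count-based burden score: for each participant, sum the minor-allele counts over variants that are flagged as putatively functional and have MAF ≤ 1%. This is a minimal sketch; the exact count statistic used in the study, the functional annotation, and the survival model that yields the hazard ratio are not shown, and the `functional` flags and function name are hypothetical inputs.

```python
def burden_scores(geno_by_variant, functional, maf_max=0.01):
    """Per-individual minor-allele count over rare variants.

    geno_by_variant: one list per variant of 0/1/2 allele counts per individual.
    functional: hypothetical per-variant flags marking putatively functional variants.
    """
    n = len(geno_by_variant[0])
    scores = [0] * n
    for g, is_functional in zip(geno_by_variant, functional):
        p = sum(g) / (2 * n)  # frequency of the coded allele
        maf = min(p, 1 - p)
        if not is_functional or maf > maf_max:
            continue  # keep only rare, putatively functional variants
        # If the coded allele is the major one, flip the coding to count minor alleles.
        minor = g if p <= 0.5 else [2 - x for x in g]
        for i, x in enumerate(minor):
            scores[i] += x
    return scores


n = 200
rare = [1] + [0] * (n - 1)                          # one carrier: MAF = 0.0025
common = [1 if i < 100 else 0 for i in range(n)]    # MAF = 0.25, excluded
nonfunctional = [0] * 5 + [1] + [0] * (n - 6)       # rare but not flagged
flipped = [2, 1] + [2] * (n - 2)                    # coded allele is the major one
scores = burden_scores([rare, common, nonfunctional, flipped], [True, True, False, True])
print(scores[:6])  # [1, 1, 0, 0, 0, 0]
```

    The resulting score would then enter a time-to-event model as a single covariate, which is what lets many individually rare variants be tested jointly.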

    Accumulation of M1dG DNA adducts after chronic exposure to PCBs, but not from acute exposure to polychlorinated aromatic hydrocarbons

    Oxidative DNA damage is thought to be one of the key events involved in mutation and cancer. The present study examined the accumulation of M1dG (3-(2′-deoxy-β-D-erythro-pentofuranosyl)-pyrimido[1,2-a]-purin-10(3H)-one) DNA adducts after a single dose of, or one-year exposure to, polyhalogenated aromatic hydrocarbons (PHAH) in order to evaluate the potential role of oxidative DNA damage in PHAH toxicity and carcinogenicity. The effect of PHAH exposure on the number of M1dG adducts was first explored in female mice given a single dose of either 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) or a PHAH mixture. A single exposure to PHAH had no significant effect on the number of M1dG adducts compared with the corn oil control group. The role of M1dG adducts in polychlorinated biphenyl (PCB)-induced toxicity and carcinogenicity was further investigated in rats exposed for one year to PCB 153, PCB 126, or a mixture of the two. PCB 153, at doses up to 3000 μg/kg/d, had no significant effect on the number of M1dG adducts in liver and brain tissue compared with controls. However, PCB 126 at 1000 ng/kg/d resulted in M1dG adduct accumulation in the liver. More importantly, co-administration of equal proportions of PCB 153 and PCB 126 produced dose-dependent increases in hepatic M1dG adduct accumulation across doses of 300–1000 ng/kg/d PCB 126 combined with 300–1000 μg/kg/d PCB 153. Co-administration of varying amounts of PCB 153 with a fixed amount of PCB 126 also produced greater M1dG adduct accumulation at higher PCB 153 doses. These results are consistent with cancer bioassays that demonstrated a synergistic effect of PCB 126 and PCB 153 on toxicity and tumor development. In summary, the present results support the hypothesis that oxidative DNA damage plays a key role in toxicity and carcinogenicity following long-term PCB exposure.
